Optimal Fast Johnson-Lindenstrauss Embeddings for Large Data Sets
Authors
Department of Mathematics, Technische Universität München (both authors)
Abstract
We introduce a new fast construction of a Johnson-Lindenstrauss matrix based on the composition of two embeddings. First, a fast construction by the second author, joint with Ward [1], maps the points into a space of lower, but not optimal, dimension. A subsequent transformation by a dense matrix with independent entries then reaches an optimal embedding dimension. As we show in this note, the computational cost of applying this transform simultaneously to all points in a large data set comes close to the complexity of just reading the data, under only very mild restrictions on the size of the data set. Along the way, our construction also yields the least restricted Johnson-Lindenstrauss transform of order-optimal embedding dimension known to date that allows for a fast query step, that is, a fast application to an arbitrary point that is not part of the given data set.
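The two-stage composition described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out either stage, so everything concrete here is an assumption: the first stage is taken to be a row-subsampled FFT with random sign flips (one common fast construction), the second a dense Gaussian matrix, and the names `two_stage_embed`, `m_mid`, and `m_out` are hypothetical. It is not the authors' exact construction.

```python
import numpy as np

def two_stage_embed(X, m_mid, m_out, seed=0):
    """Embed the columns of X (shape d x n) into m_out dimensions.

    Stage 1 (assumed): random signs, orthonormal FFT, random row subsampling.
    Stage 2 (assumed): dense Gaussian matrix with independent entries.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape

    # Stage 1: fast map to an intermediate dimension, O(d log d) per point.
    signs = rng.choice([-1.0, 1.0], size=d)
    rows = rng.choice(d, size=m_mid, replace=False)
    F = np.fft.fft(signs[:, None] * X, axis=0, norm="ortho")
    Y = F[rows] * np.sqrt(d / m_mid)      # rescale so norms are kept in expectation
    Y = np.vstack([Y.real, Y.imag])       # stay real-valued: shape (2 * m_mid, n)

    # Stage 2: dense map down to the target dimension, applied to all points at once.
    G = rng.standard_normal((m_out, Y.shape[0])) / np.sqrt(m_out)
    return G @ Y

# Usage: 20 points in dimension 1024, reduced to 256 (complex) and then to 64.
X = np.random.default_rng(1).standard_normal((1024, 20))
Z = two_stage_embed(X, m_mid=256, m_out=64)
```

The point of the split is that the dense matrix only ever acts on the already reduced intermediate vectors, so its cost does not depend on the original dimension d.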
Related resources
Fast Nearest Neighbor Preserving Embeddings
We show an analog to the Fast Johnson-Lindenstrauss Transform for Nearest Neighbor Preserving Embeddings in ℓ2. These are sparse, randomized embeddings that preserve the (approximate) nearest neighbors. The dimensionality of the embedding space is bounded not by the size of the embedded set n, but by its doubling dimension λ. For most large real-world datasets this will mean a considerably lowe...
The Fast Johnson-Lindenstrauss Transform
While we omit the proof, we remark that it is constructive. Specifically, A is a linear map consisting of random projections onto subspaces of ℝ^d. These projections can be computed by n matrix multiplications, which take time O(nkd). This is fast enough to make the Johnson-Lindenstrauss transform (JLT) a practical and widespread algorithm for dimensionality reduction, which in turn motivates th...
Fast binary embeddings, and quantized compressed sensing with structured matrices
This paper deals with two related problems, namely distance-preserving binary embeddings and quantization for compressed sensing. First, we propose fast methods to replace points from a subset X ⊂ ℝ^n, associated with the Euclidean metric, with points in the cube {±1}^m and we associate the cube with a pseudo-metric that approximates Euclidean distance among points in X. Our methods rely on qua...
On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces
Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. In order to reduce the storage space and ensure efficient performance of queries, dimensionality reduction while preserving the inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point o...
New bounds for circulant Johnson-Lindenstrauss embeddings
This paper analyzes circulant Johnson-Lindenstrauss (JL) embeddings which, as an important class of structured random JL embeddings, are formed by randomizing the column signs of a circulant matrix generated by a random vector. With the help of recent decoupling techniques and matrix-valued Bernstein inequalities, we obtain a new bound k = O(ε^{-2} log^{(1+δ)}(n)) for Gaussian circulant JL embeddings. Moreo...
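As a rough illustration of the circulant construction described in this excerpt, the sketch below applies a circulant matrix with randomized column signs to a vector through the FFT. The excerpt does not state which rows are kept or how the result is normalized, so keeping the first k rows and scaling by 1/sqrt(k) are assumptions, and the function name `circulant_jl` is hypothetical.

```python
import numpy as np

def circulant_jl(x, g, signs, k):
    """Apply the first k rows of C @ diag(signs) to x, scaled by 1/sqrt(k).

    C is the circulant matrix whose first column is g, so C @ v is the
    circular convolution of g with v and costs O(d log d) via the FFT.
    """
    v = signs * x                                        # randomize column signs
    conv = np.fft.ifft(np.fft.fft(g) * np.fft.fft(v)).real
    return conv[:k] / np.sqrt(k)                         # assumed: keep first k rows

# Usage: embed a vector from dimension 512 into dimension 64.
rng = np.random.default_rng(0)
d, k = 512, 64
g = rng.standard_normal(d)                # Gaussian generating vector
signs = rng.choice([-1.0, 1.0], size=d)   # random column signs
y = circulant_jl(rng.standard_normal(d), g, signs, k)
```

Only the vectors `g` and `signs` need to be stored, which is what makes this class of embeddings cheaper than a dense k x d matrix.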
Journal: CoRR
Volume: abs/1712.01774
Publication year: 2017